Improving NMF clustering by leveraging contextual relationships among words

Abstract

Non-negative Matrix Factorization (NMF) and its variants have been successfully used for clustering text documents. However, NMF approaches, like other models, do not explicitly account for the contextual dependencies between words. To remedy this limitation, we draw inspiration from neural word embedding and posit that words that frequently co-occur within the same context (e.g., sentence or document) are likely related to each other in some semantic aspect. We then propose to jointly factorize the document-word and word-word co-occurrence matrices. The decomposition of the latter matrix encourages frequently co-occurring words to have similar latent representations, thereby reflecting the relationships among them. Empirical results on several real-world datasets provide strong support for the benefits of our approach. Our main finding is that one can drastically improve the clustering performance of NMF by leveraging the contextual relationships among words explicitly.
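
The joint factorization described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the objective (squared Frobenius loss on both matrices with a shared word factor W), the multiplicative updates, the weight `lam`, the raw document-level co-occurrence counts, and all names (`joint_nmf`, `Z`, `W`, `Q`) are assumptions made for illustration only.

```python
import numpy as np

def joint_nmf(X, M, k, lam=1.0, n_iter=200, eps=1e-9, seed=0):
    """Illustrative joint NMF (assumed objective, not the paper's):
    minimize ||X - Z W^T||_F^2 + lam * ||M - W Q^T||_F^2 with Z, W, Q >= 0.
    X: documents x words, M: words x words co-occurrence."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Z = rng.random((n, k)) + eps   # document factors
    W = rng.random((d, k)) + eps   # word factors, shared by both terms
    Q = rng.random((d, k)) + eps   # context-word factors
    losses = []
    for _ in range(n_iter):
        # Standard multiplicative updates, one factor at a time.
        Z *= (X @ W) / (Z @ (W.T @ W) + eps)
        Q *= (M.T @ W) / (Q @ (W.T @ W) + eps)
        W *= (X.T @ Z + lam * (M @ Q)) / (W @ (Z.T @ Z + lam * (Q.T @ Q)) + eps)
        loss = (np.linalg.norm(X - Z @ W.T) ** 2
                + lam * np.linalg.norm(M - W @ Q.T) ** 2)
        losses.append(loss)
    return Z, W, Q, losses

# Toy corpus: 4 documents, 4 words (hypothetical counts).
X = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 1, 2],
              [0, 0, 2, 1]], dtype=float)

# Assumed co-occurrence: number of documents in which two words both appear.
B = (X > 0).astype(float)
M = B.T @ B
np.fill_diagonal(M, 0.0)

Z, W, Q, losses = joint_nmf(X, M, k=2)
labels = Z.argmax(axis=1)  # assign each document to its dominant factor
```

Because W appears in both terms, words that co-occur often are pushed toward similar rows of W, which in turn shapes the document factors Z used for clustering. A weighted or PMI-style co-occurrence matrix could be substituted for the raw counts used here.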

Related resources

Refinement of Document Clustering by Using NMF

In this paper, we use non-negative matrix factorization (NMF) to refine the document clustering results. NMF is a dimensional reduction method and effective for document clustering, because a term-document matrix is high-dimensional and sparse. The initial matrix of the NMF algorithm is regarded as a clustering result, therefore we can use NMF as a refinement method. First we perform min-max cu...

Improving Newsgroup Clustering by Filtering Author-Specific Words

Introduction. This paper describes the first step in a project for topic identification in help-desk applications. In this step, we apply a clustering mechanism to identify the topics of newsgroup discussions. We have used newsgroup discussions as our testbed, as they provide a good approximation to our target application, while obviating the need for manual tagging of topics. We have found tha...

Clustering Words

KEY WORDS-Statistical Parsing, Grammar Acquisition, Clustering Analysis, Local Contextual

This paper proposes a new method for learning a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus based on clustering analysis and describes a natural language parsing model which uses a probability-based scoring function of the grammar to rank parses of a sentence. By grouping brackets in a corpus into a number of similar bracket groups based on ...

Shifted NMF with Group Sparsity for Clustering NMF Basis Functions

Recently, Non-negative Matrix Factorisation (NMF) has found application in separation of individual sound sources. NMF decomposes the spectrogram of an audio mixture into an additive parts based representation where the parts typically correspond to individual notes or chords. However, there is a need to cluster the NMF basis functions to their sources. Although, many attempts have been made to...

Journal

Journal title: Neurocomputing

Year: 2022

ISSN: 0925-2312, 1872-8286

DOI: https://doi.org/10.1016/j.neucom.2022.04.122